AFCON 2024 MATCH ANALYSIS: A data-driven exploration of team performance and scoring trends across both halves of a game¶
This project takes a closer look at how teams performed in each half of the game. It analyses the scoring trends and results in both halves, as well as match outcomes.
This project provides an in-depth analysis of the Africa Cup of Nations (AFCON) football tournament, focusing on team performance, scoring patterns, and match outcomes. Using data sourced through web scraping from Sofascore, it aims to identify trends in goals scored across both halves, assess team performance in different match phases, and explore factors contributing to match outcomes.
The dataset includes match dates, team names, goals scored by each team in the first and second halves, first and second half results, and the full-time results. Additional data cleaning was done in Microsoft Excel to correct inaccurate scores and ensure data integrity.
With a focus on delivering valuable insights, the project showcases interactive and visually engaging data representations using Plotly in Jupyter Notebook.
Data Collection¶
In this section, we detail the process used to collect data for the Africa Cup of Nations (AFCON) football tournament. The data was retrieved using web scraping via an API provided by Sofascore, which allows access to real-time match statistics. The process involves sending requests to a specific API endpoint, parsing the returned data, and cleaning it for further analysis.
# importing the needed libraries
import requests
import json
import csv
import pandas as pd
import plotly.express as px
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import plotly.graph_objects as go
from plotly.subplots import make_subplots
%matplotlib inline
import plotly.io as pio
pio.renderers.default = 'notebook'
# Establishing a connection with the website
response = requests.get("https://api.sofascore.com/api/v1/event/11761888")
if response.status_code == 200:
    print(response.json())  # This will print the JSON response
else:
    print("Failed to retrieve data")
afcon_url = "https://www.sofascore.com/api/v1/unique-tournament/270/season/56021/team-events/total"
response = requests.get(afcon_url)
if response.status_code == 200:
    # Parse JSON data
    afcon_data = response.json()
    print(afcon_data)  # Display the raw JSON data for inspection
else:
    print(f"Failed to retrieve data: {response.status_code}")
The data needed for the project was extracted from the JSON response.
# a list to store the match results (one dictionary per game)
matches_data = []
# going through all the top-level values of the response
for matches_dict in afcon_data.values():
    # going through the keys that identify the groups and their teams
    for group_key, team_games in matches_dict.items():
        # going through every team in the group
        for team_game in team_games:
            # going through every game played by the team and extracting the needed information
            for game in team_games[team_game]:
                match_info = {
                    'game_id': game['id'],
                    'Date': game['startTimestamp'],
                    'group_name': game['tournament']['name'],  # Extract group name
                    'home_team': game['homeTeam']['name'],  # Home team name
                    'away_team': game['awayTeam']['name'],  # Away team name
                    'home_goals_ht': game['homeScore']['period1'],  # Home team goals at half-time
                    'away_goals_ht': game['awayScore']['period1'],  # Away team goals at half-time
                    'home_goals_2nd_half': game['homeScore']['period2'],  # Home team goals in the second half
                    'away_goals_2nd_half': game['awayScore']['period2'],  # Away team goals in the second half
                    'home_goals_ft': game['homeScore']['normaltime'],  # Full-time home goals
                    'away_goals_ft': game['awayScore']['normaltime'],  # Full-time away goals
                }
                matches_data.append(match_info)
# creating a csv file to store the matches
csv_file = 'afcon_group_stage_2024.csv'
# the headers for the csv file
fields = ['group_name', 'game_id', 'Date', 'home_team', 'away_team', 'home_goals_ht', 'away_goals_ht',
          'home_goals_2nd_half', 'away_goals_2nd_half', 'home_goals_ft', 'away_goals_ft']
with open(csv_file, mode='w', newline='') as file:
    writer = csv.DictWriter(file, fieldnames=fields)
    # Write the header
    writer.writeheader()
    # Write each match's data
    for match in matches_data:
        writer.writerow(match)
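The extraction loop indexes every score field directly, so a game with a missing period score would raise a `KeyError`, and because the endpoint lists games per team, the same game can appear more than once. The following is a minimal sketch of a more defensive variant, not the method used in this notebook. The helper names `extract_match` and `unique_matches` and the sample events are hypothetical; only the field names are taken from the loop above.

```python
def extract_match(game):
    """Flatten one event dictionary into a row, using None for missing fields."""
    home_score = game.get("homeScore", {})
    away_score = game.get("awayScore", {})
    return {
        "game_id": game.get("id"),
        "Date": game.get("startTimestamp"),
        "group_name": game.get("tournament", {}).get("name"),
        "home_team": game.get("homeTeam", {}).get("name"),
        "away_team": game.get("awayTeam", {}).get("name"),
        "home_goals_ht": home_score.get("period1"),
        "away_goals_ht": away_score.get("period1"),
        "home_goals_2nd_half": home_score.get("period2"),
        "away_goals_2nd_half": away_score.get("period2"),
        "home_goals_ft": home_score.get("normaltime"),
        "away_goals_ft": away_score.get("normaltime"),
    }


def unique_matches(games):
    """Keep only the first occurrence of each game id."""
    seen = {}
    for game in games:
        row = extract_match(game)
        seen.setdefault(row["game_id"], row)
    return list(seen.values())


# Hypothetical events: the first game is listed twice, the second has no scores yet.
sample_games = [
    {"id": 1, "startTimestamp": 1705154400,
     "tournament": {"name": "Group A"},
     "homeTeam": {"name": "Team A"}, "awayTeam": {"name": "Team B"},
     "homeScore": {"period1": 1, "period2": 1, "normaltime": 2},
     "awayScore": {"period1": 0, "period2": 0, "normaltime": 0}},
    {"id": 1, "startTimestamp": 1705154400,
     "tournament": {"name": "Group A"},
     "homeTeam": {"name": "Team A"}, "awayTeam": {"name": "Team B"},
     "homeScore": {"period1": 1, "period2": 1, "normaltime": 2},
     "awayScore": {"period1": 0, "period2": 0, "normaltime": 0}},
    {"id": 2, "startTimestamp": 1705240800,
     "tournament": {"name": "Group A"},
     "homeTeam": {"name": "Team C"}, "awayTeam": {"name": "Team D"},
     "homeScore": {}, "awayScore": {}},
]

rows = unique_matches(sample_games)
print(len(rows))
```

Rows with `None` scores can then be filtered out or inspected before the CSV is written.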
Microsoft Excel was used to correct inaccurate match results and to remove matches that appeared more than once. Additional columns were created for the number of goals scored in each half, the result of each half, and the total number of goals scored in the game. The date was converted from the timestamp format in the JSON to a readable date format.
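The same cleaning steps could also be done in pandas. The sketch below is an illustration on hypothetical rows, not a record of what was done in Excel. It assumes that `Date` holds a Unix timestamp in seconds and that the `period2` values are goals scored in the second half only; the result column names and the `half_result` helper are made up for the example.

```python
import pandas as pd

# Hypothetical rows shaped like the CSV written above; the second row is a duplicate.
raw = pd.DataFrame({
    "game_id": [1, 1, 2],
    "Date": [1705154400, 1705154400, 1705240800],
    "home_team": ["Team A", "Team A", "Team C"],
    "away_team": ["Team B", "Team B", "Team D"],
    "home_goals_ht": [1, 1, 0],
    "away_goals_ht": [0, 0, 0],
    "home_goals_2nd_half": [1, 1, 0],
    "away_goals_2nd_half": [0, 0, 2],
})


def half_result(home, away):
    """Label the outcome of one half from the goals scored in it."""
    if home > away:
        return "Home win"
    if home < away:
        return "Away win"
    return "Draw"


# Remove games that appear more than once
clean = raw.drop_duplicates(subset="game_id").reset_index(drop=True)
# Convert the timestamp (assumed to be seconds since the epoch) to a datetime
clean["Date"] = pd.to_datetime(clean["Date"], unit="s")
# Goals scored in each half and in the whole game
clean["total_1st_half_goals"] = clean["home_goals_ht"] + clean["away_goals_ht"]
clean["total_2nd_half_goals"] = clean["home_goals_2nd_half"] + clean["away_goals_2nd_half"]
clean["total_goals"] = clean["total_1st_half_goals"] + clean["total_2nd_half_goals"]
# Result of each half
clean["first_half_result"] = [
    half_result(h, a) for h, a in zip(clean["home_goals_ht"], clean["away_goals_ht"])
]
clean["second_half_result"] = [
    half_result(h, a)
    for h, a in zip(clean["home_goals_2nd_half"], clean["away_goals_2nd_half"])
]
print(clean)
```

Doing the cleaning in code keeps every correction reproducible, although manual score corrections would still need to be listed explicitly.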
afcon_2024_group_stage = pd.read_csv(r"C:\Users\Felix\Documents\python practice\data science projects\AFCON_2024_ANALYSIS\Files\AFCON GROUP STAGE GAMES 2023.csv")
# Trimming the group name
afcon_2024_group_stage['group_name']= afcon_2024_group_stage['group_name'].str.replace(r'^Africa Cup of Nations, ', '',regex=True)
afcon_2024_group_stage['Date'] = pd.to_datetime(afcon_2024_group_stage['Date'],dayfirst=True)
afcon_2024_group_stage.info()
Exploratory Data Analysis¶
# loading a csv file containing first and second half goals of all the teams
first_second_half_goals = pd.read_csv(r"C:\Users\Felix\Documents\python practice\data science projects\AFCON_2024_ANALYSIS\Files\AFCON GROUP STAGE GAMES 2023_FIRST AND SECOND HALF GOALS.csv")
# Summing the first and second half goals
total_first_half_goals = first_second_half_goals['first_half_goals'].sum()
total_second_half_goals = first_second_half_goals['second_half_goals'].sum()
# setting up the values and labels for the pie chart
goals = [total_first_half_goals, total_second_half_goals]
labels = ['First Half Goals', 'Second Half Goals']
A total of 88 goals were scored in the group stage of the tournament. Thirty-three goals (about 37%) were scored in the first half of games, while fifty-six goals (about 63%) were scored in the second half.
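The percentages can be derived directly from the two totals rather than worked out by hand. A minimal sketch follows; the helper name `goal_shares` is hypothetical and the numbers passed to it are illustrative only, not the tournament totals.

```python
def goal_shares(first_half_goals, second_half_goals):
    """Return the percentage of goals scored in each half, rounded to one decimal."""
    total = first_half_goals + second_half_goals
    if total == 0:
        return 0.0, 0.0
    first_share = round(100 * first_half_goals / total, 1)
    second_share = round(100 * second_half_goals / total, 1)
    return first_share, second_share


# Illustrative values only
first_share, second_share = goal_shares(3, 5)
print(f"First half: {first_share}%, second half: {second_share}%")
```

In the notebook, the call would presumably be `goal_shares(total_first_half_goals, total_second_half_goals)`.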
# A pie chart to show the distribution of goals in the first and second half
fig = px.pie(names=labels, values=goals)
fig.update_layout(title="Goals Scored in First and Second Halves by Teams")
fig.show()
The first half of games averaged 0.9 goals per game. Morocco, Cape Verde, Equatorial Guinea and South Africa were the highest-scoring teams in the first half, with three goals each. Cameroon, Mozambique, Gambia and Namibia all failed to register a first-half goal in any of their group stage games.
The second half of games averaged 1.5 goals per game. Senegal and Equatorial Guinea scored the most second-half goals, with six each. Egypt and Cameroon were the next highest-scoring teams in the second half with five goals each, while Cape Verde, Mozambique and Angola scored four goals each.
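Figures like these can be read off the per-team table programmatically. The sketch below uses a hypothetical four-team table with the same column names as `first_second_half_goals`; the helper names `top_scorers` and `scoreless` and the number of games are assumptions for illustration.

```python
import pandas as pd

# Hypothetical per-team table with the same columns as first_second_half_goals
teams = pd.DataFrame({
    "Team": ["Team A", "Team B", "Team C", "Team D"],
    "first_half_goals": [3, 0, 3, 1],
    "second_half_goals": [2, 4, 1, 4],
})
n_games = 6  # hypothetical: four teams playing each other once


def top_scorers(df, column):
    """Teams tied for the highest value in `column`."""
    return df.loc[df[column] == df[column].max(), "Team"].tolist()


def scoreless(df, column):
    """Teams with no goals in `column`."""
    return df.loc[df[column] == 0, "Team"].tolist()


# Each goal is counted once, under the team that scored it,
# so the column sum divided by the number of games is the per-game average.
avg_first_half = teams["first_half_goals"].sum() / n_games
avg_second_half = teams["second_half_goals"].sum() / n_games

print(top_scorers(teams, "first_half_goals"))
print(scoreless(teams, "first_half_goals"))
print(round(avg_first_half, 1), round(avg_second_half, 1))
```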
# bar chart displaying first and second half goals scored by each team
fig = px.bar(
    first_second_half_goals,
    x='Team',
    y=['first_half_goals', 'second_half_goals'],
    barmode='group',
    labels={'variable': 'Half', 'value': 'Goals', 'Team': 'Team'},
    color_discrete_sequence=['orange', 'blue'],
)
fig.update_xaxes(tickangle=45)
fig.update_traces(texttemplate='%{y}', textposition='inside')
fig.update_layout(
    title="Goals Scored in First and Second Halves by Teams",
    bargap=0.35,
    template='seaborn',
    height=500,
)
fig.show()
Top Scoring Teams in The Group Stages¶
fig = px.bar(first_second_half_goals.sort_values(by='total_goals_scored',ascending=False ), x = 'Team',
y = 'total_goals_scored',
)
fig.update_traces(texttemplate = '%{y}',
textposition = 'inside',
)
fig.update_xaxes(tickangle = 45)
fig.update_layout(title="Goals Scored in The Group Stages",
template = 'seaborn')
fig.show()
Comparison of Goals Scored in The Groups¶
# the number of first and second half goals scored in each group
group_goals = afcon_2024_group_stage.groupby('group_name')[['total_1st_half_goals','total_2nd_half_goals']].sum()
# group_goals = group_goals.sort_values(by='total_2nd_half_goals', ascending=False)
Grouped bar chart to compare the goals scored in the groups¶
fig = px.bar(
    group_goals,
    y=['total_2nd_half_goals', 'total_1st_half_goals'],  # the values for the y-axis
    x=group_goals.index,  # the labels on the x-axis
    barmode='group',
    labels={'variable': 'Half', 'value': 'Goals', 'Team': 'Team'},
)
fig.update_traces(texttemplate='%{y}', # Use the y-value of each bar as text
textposition='inside')
fig.update_layout(
title='Comparison of First and Second Half Goals Scored in Each Group', # Informative title
xaxis_title='Groups', # Name for x-axis
yaxis_title='Number of Goals', # Name for y-axis
template='seaborn'
)
fig.show()
Stacked bar chart comparing goals scored in the groups¶
fig = px.bar(group_goals,
y = ['total_2nd_half_goals','total_1st_half_goals'],
x = group_goals.index,
barmode='stack',
labels={'variable' : 'Half','value' : 'Goals','Team' : 'Team'},
)
fig.update_traces(texttemplate='%{y}', # Use the y-value of each bar as text
textposition='inside')
fig.update_layout(
title='Comparison of First and Second Half Goals Scored in Each Group', # Informative title
xaxis_title='Groups', # Name for x-axis
yaxis_title='Number of Goals', # Name for y-axis
template='seaborn'
)
fig.show()
# sorting the rows of the dataframe by their group names
group_name = afcon_2024_group_stage[['group_name', 'home_team']].sort_values('group_name')
# making the teams the index of the dataframe
group_name = group_name.set_index('home_team')
# converting the column with the group names into a dictionary (team -> group)
group_mapping = group_name['group_name'].to_dict()
# adding the group names to the dataframe using the map method
first_second_half_goals['group_name'] = first_second_half_goals['Team'].map(group_mapping)
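Because the mapping above is built from `home_team` alone, a team that never appears as the home side would end up without a group. The following is a minimal sketch, on hypothetical toy data, of building the mapping from both team columns and checking for gaps. It is an optional safeguard rather than the approach used in this notebook, and the names `team_groups` and `unmapped` are made up for the example.

```python
import pandas as pd

# Hypothetical match table; "Team B" and "Team D" only ever play away
matches = pd.DataFrame({
    "group_name": ["Group A", "Group A", "Group B"],
    "home_team": ["Team A", "Team A", "Team C"],
    "away_team": ["Team B", "Team B", "Team D"],
})
teams = pd.DataFrame({"Team": ["Team A", "Team B", "Team C", "Team D"]})

# Stack home and away teams into one (group_name, Team) table
team_groups = pd.concat([
    matches[["group_name", "home_team"]].rename(columns={"home_team": "Team"}),
    matches[["group_name", "away_team"]].rename(columns={"away_team": "Team"}),
])
# One row per team, then a team -> group dictionary
group_mapping = (
    team_groups.drop_duplicates("Team").set_index("Team")["group_name"].to_dict()
)

teams["group_name"] = teams["Team"].map(group_mapping)
# Teams whose name did not match any key in the mapping
unmapped = teams.loc[teams["group_name"].isna(), "Team"].tolist()
print(unmapped)
```

An empty `unmapped` list indicates that every team received a group; a non-empty one usually points to a spelling difference between the two files.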
import plotly.graph_objects as go
from plotly.subplots import make_subplots
# Create subplots with 2 rows and 3 columns
fig = make_subplots(
    rows=2, cols=3,
    subplot_titles=list(first_second_half_goals['group_name'].unique()),
    shared_yaxes=False  # Each subplot keeps its own y-axis
)
# Define uniform colors for each goal type
colors = {'1st Half Goals': 'blue', '2nd Half Goals': 'green', 'Total Goals': 'orange'}
# Track whether a legend entry has been added for each category
legend_added = {'1st Half Goals': False, '2nd Half Goals': False, 'Total Goals': False}
# Create bar charts for each group
row = 1
col = 1
# Loop over the unique groups and create the plots
for group in first_second_half_goals['group_name'].unique():
    group_data = first_second_half_goals[first_second_half_goals['group_name'] == group]
    # Plot for 1st Half Goals
    fig.add_trace(
        go.Bar(
            x=group_data['Team'],
            y=group_data['first_half_goals'],
            name='1st Half Goals',
            marker_color=colors['1st Half Goals'],
            showlegend=not legend_added['1st Half Goals'],  # Only show legend once
            legendgroup='First Half'
        ),
        row=row, col=col
    )
    legend_added['1st Half Goals'] = True
    # Plot for 2nd Half Goals
    fig.add_trace(
        go.Bar(
            x=group_data['Team'],
            y=group_data['second_half_goals'],
            name='2nd Half Goals',
            marker_color=colors['2nd Half Goals'],
            showlegend=not legend_added['2nd Half Goals'],  # Only show legend once
            legendgroup='Second Half'
        ),
        row=row, col=col
    )
    legend_added['2nd Half Goals'] = True
    fig.update_traces(texttemplate='%{y}',  # Use the y-value of each bar as text
                      textposition='inside')
    # Plot for Total Goals
    # fig.add_trace(
    #     go.Bar(
    #         x=group_data['Team'],
    #         y=group_data['total_goals_scored'],
    #         name='Total Goals' if not legend_added['Total Goals'] else None,  # Add to legend only once
    #         marker_color=colors['Total Goals'],
    #         showlegend=not legend_added['Total Goals']  # Only show legend once
    #     ),
    #     row=row, col=col
    # )
    # legend_added['Total Goals'] = True
    # Update row and column positions for the next group
    col += 1
    if col > 3:
        col = 1
        row += 1
# Update layout
fig.update_layout(
    title_text="Comparison of First and Second Half Goals Scored Across Groups",
    height=800,
    width=1200,
    showlegend=True,
    barmode='group',  # Place the bars for each team side by side
    template='plotly_white'
)
# formatting the labels on the x-axis to make the graph more readable.
# called without row/col, update_xaxes applies the change to every subplot.
fig.update_xaxes(tickangle=45)
# formatting the y-axis values for uniformity.
y_axis_range = [0, 6]
# applying the same range and tick spacing to every subplot
# (the original loop ran over len(group), the length of the last group's name, not the number of groups)
fig.update_yaxes(range=y_axis_range, dtick=1)
# Show the figure
fig.show()
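The row and column bookkeeping used when placing each group in the 2 x 3 grid can also be written as a small helper. This is only a sketch; the function name `grid_position` is hypothetical and it assumes subplots are filled left to right, top to bottom.

```python
def grid_position(index, n_cols=3):
    """Return the 1-based (row, col) of the subplot at 0-based position `index`."""
    row, col = divmod(index, n_cols)
    return row + 1, col + 1


# Positions of six groups in a grid with three columns
positions = [grid_position(i) for i in range(6)]
print(positions)
```

Inside the plotting loop, `enumerate` over the group names together with `grid_position` would replace the manual `row` and `col` counters.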